SQL Server 2008 : Developing Custom Managed Database Objects (part 3) - Developing Managed User-Defined Functions

3/23/2011 9:20:41 AM

Developing Managed User-Defined Functions (UDFs)

Using SQL Server 2008 and the .NET Framework, you can write both scalar (single-valued) and table-valued user-defined functions in managed code. Scalar functions are the easier of the two, so we look at those first.

Scalar UDFs

In VS, right-click your SQLCLR project in the Solution Explorer and select Add, Add New Function. Name the new class XSLT and, when it opens in the code editor, rename its default method to XSLTransform, because that is what it is going to do: transform the content of an xml-typed variable by using an XSLT stylesheet that is also stored in an xml column.

The xml data type lets you take advantage of server-side storage of XML, and why not leverage that same technology to store XSLT stylesheets? You’ll have the assurance that before you save your XSLTs to your table, they are guaranteed to be well formed.
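Well formed is not the same as usable: the xml data type checks that a stylesheet is valid XML, not that it compiles as XSLT. The following is a minimal sketch of a pre-flight check you could run before inserting a stylesheet. The class and method names (StylesheetCheckSketch, Compiles) are hypothetical and are not part of the listings in this article.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class StylesheetCheckSketch
{
    // Hypothetical helper: returns true when the text is well-formed XML
    // that XslCompiledTransform can also load as a stylesheet.
    public static bool Compiles(string stylesheet)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(stylesheet)))
            {
                new XslCompiledTransform(false).Load(reader);
            }
            return true;
        }
        catch (XmlException)   // not well-formed XML
        {
            return false;
        }
        catch (XsltException)  // well formed, but not a loadable stylesheet
        {
            return false;
        }
    }
}
```

A check like this runs on the client or in a test harness. It does not replace the well-formedness guarantee that the xml column gives you on the server.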

You need to add using statements to the newly created XSLT.cs file for the namespaces System.IO, System.Xml, and System.Xml.Xsl. You need System.IO to use the streams it offers (but not to write files), and you need the XslCompiledTransform object to perform the transformation.

Listing 2 shows the code of our new scalar function (note that we removed the namespace statement from XSLT.cs).

Listing 2. A Managed Scalar UDF for Transforming XML

using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public class XSLT
{
    [SqlFunction(
        DataAccess = DataAccessKind.None,
        IsDeterministic=false,
        IsPrecise=true,
        Name="clr_XSLTransform",
        SystemDataAccess=SystemDataAccessKind.None
    )]
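    public static SqlXml XSLTransform(SqlXml InputXml, SqlXml XSLT)
    {
        // A NULL input or stylesheet yields NULL instead of an exception.
        if (InputXml.IsNull || XSLT.IsNull)
        {
            return SqlXml.Null;
        }

        // The transform output goes to an in-memory stream, not to a file.
        MemoryStream ms = new MemoryStream();
        XslCompiledTransform xslcomp = new XslCompiledTransform(false);
        xslcomp.Load(XSLT.CreateReader());
        xslcomp.Transform(InputXml.CreateReader(), null, ms);

        // Rewind so the reader starts at the beginning of the output.
        ms.Seek(0, SeekOrigin.Begin);
        XmlTextReader xreader = new XmlTextReader(ms);
        return new SqlXml(xreader);
    }
}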

Notice the use of the new SqlFunction attribute and its named parameter list. The implementation contract for managed scalar functions is just the same as for stored procedures: mark it as static and decorate it with the appropriate attribute.

The following named parameters are available for scalar UDFs:

DataAccess— Tells SQL Server whether the function will access user table data on the server in its body. If you provide the enum value DataAccessKind.None, some optimizations may be made.
SystemDataAccess— Tells SQL Server whether the function will access system table data on the server in its body. Again, if you provide the enum value SystemDataAccessKind.None, some optimizations may be made.
IsDeterministic— Tells SQL Server whether the function will always return the same values, given the same input parameters.
A common example of a nondeterministic function is GETDATE(), which always returns something different. A function is also said to be nondeterministic if any of the functions that it calls are nondeterministic. ISNUMERIC() is a good example of a deterministic function.
IsPrecise— Tells SQL Server whether the function does floating-point arithmetic (in which case, you provide the value false). Precise functions can be indexed; nonprecise functions cannot.
Name— Tells the deployment routine what to call the function when it is created in the database.

What’s neat about the code in Listing 2 is that it performs the entire XML transformation without using file I/O (except for the I/O required for SQL Server’s paging functionality). You should build and deploy the code in this listing to your instance of SQL Server.
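If you want to try a stylesheet before deploying anything, the same in-memory technique can be exercised outside SQL Server. The following is a minimal sketch, assuming plain strings in place of SqlXml. The names XsltSketch and Transform are hypothetical and are not part of Listing 2.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltSketch
{
    // Hypothetical helper: applies a stylesheet to an XML string entirely
    // in memory and returns the transformed output as a string.
    public static string Transform(string inputXml, string stylesheet)
    {
        XslCompiledTransform xslt = new XslCompiledTransform(false);
        using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (MemoryStream ms = new MemoryStream())
        {
            using (XmlReader inputReader = XmlReader.Create(new StringReader(inputXml)))
            {
                // Same overload as in Listing 2: reader in, stream out.
                xslt.Transform(inputReader, null, ms);
            }

            // Rewind before reading the output back.
            ms.Seek(0, SeekOrigin.Begin);
            using (StreamReader sr = new StreamReader(ms))
            {
                return sr.ReadToEnd();
            }
        }
    }
}
```

Calling XsltSketch.Transform with the text of a resume and the stylesheet from Listing 3 should give you a quick way to iterate on the XSLT before you store it in the table.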

To test this example, create the following table in AdventureWorks2008, which will hold XSLTs pertaining to the tables in the HumanResources schema:

CREATE TABLE HumanResources.XmlResources
(
    XmlResourceId int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    XmlResourceType int NOT NULL DEFAULT(1),
    XmlResourceName varchar(50) NOT NULL,
    XmlResource xml
)

You also need a stylesheet to test it. Listing 3 inserts into this table an XSLT that searches the XML in the Resume xml column of JobCandidate and pulls out the name and address information into a simpler XML structure.

Listing 3. Inserting the XSLT Used by Our Managed Scalar UDF

INSERT HumanResources.XmlResources
SELECT
    1,
    'ResumeAddressTransformer',
    '<?xml version="1.0" ?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume"
    version="1.0">
    <xsl:template match="ns:Resume">
        <NameAndAddress>
            <xsl:apply-templates/>
        </NameAndAddress>
    </xsl:template>

    <xsl:template match="ns:Name">
        <Name>
            <xsl:value-of select="ns:Name.First"/>
            <xsl:text disable-output-escaping="yes">&#160;</xsl:text>
            <xsl:value-of select="ns:Name.Last"/>
        </Name>
    </xsl:template>

    <xsl:template match="ns:Address[ns:Addr.Type=''Home'']">
        <HomeAddress>
            <HomeStreet>
                <xsl:value-of select="ns:Addr.Street"/>
            </HomeStreet>
            <HomeCity>
                <xsl:value-of select="ns:Addr.Location/ns:Location/ns:Loc.City"/>
            </HomeCity>
            <HomeState>
                <xsl:value-of select="ns:Addr.Location/ns:Location/ns:Loc.State"/>
            </HomeState>
            <HomeZip>
                <xsl:value-of select="ns:Addr.PostalCode"/>
            </HomeZip>
        </HomeAddress>
    </xsl:template>
    <xsl:template match="node()"/>
</xsl:stylesheet>
'

To save the new resource, copy the code from Listing 3 into a new SSMS query window and execute it. Deploy your assembly using VS. Then, to test the function on some data, copy the code in Listing 4 into another SSMS query window and execute it, preferably using the results-to-text (Ctrl+T) option.

You should use results-to-text here because the code in the listing executes the UDF and then, using query(), performs an XQuery against the results of the transformation (itself an instance of the xml data type) to reformat the content as textual output for use by a mail merge program.

Listing 4. Running the XSLT UDF and Transforming the Output


DECLARE @inputXML xml, @XSLT xml


SELECT @inputXML = Resume
FROM HumanResources.JobCandidate
WHERE Resume.exist('
    declare namespace
    ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";
    /ns:Resume/ns:Name[ns:Name.First="Shai"]
') = 1

SELECT @XSLT = XmlResource
FROM HumanResources.XmlResources
WHERE XmlResourceName = 'ResumeAddressTransformer'

SELECT dbo.clr_XSLTransform(@inputXML, @XSLT).query('
    declare namespace
    ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume";

    text {(/NameAndAddress/Name/text())[1]},
    text {"&#10;"},
    text {(/NameAndAddress/HomeAddress/HomeStreet/text())[1]},
    text {"&#10;"},
    text {(/NameAndAddress/HomeAddress/HomeCity/text())[1]},
    text {", "},
    text {(/NameAndAddress/HomeAddress/HomeState/text())[1]},
    text {(/NameAndAddress/HomeAddress/HomeZip/text())[1]}
') AS StreetAddress
go

StreetAddress
-------------
Shai Bassli
567 3rd Ave
Saginaw, MI 53900

(1 row(s) affected)

Table-Valued UDFs (TVFs)

Like scalar UDFs, table-valued UDFs (TVFs) use the SqlFunction attribute, except that for TVFs, two additional named parameters are available:

TableDefinition— Because you’ll be returning a table, you need to tell the compiler what the schema of that table will be. TableDefinition takes a string that corresponds to the column definition list used in the CREATE TABLE statement (that is, ColumnName ColumnDataType Constraints (etc)).
FillRowMethodName— At execution time, each row in the returned table is represented in the class as an object (in Listing 5, an array of object, or Object[] in C#). SQL Server calls the named method of the TVF’s class once per row; the method receives that row object and sets one out parameter per column to the appropriate value for the current row.

As you may have already surmised, SQL Server relies quite a bit on the .NET interfaces IEnumerable and IEnumerator to build a table of rows.

The main method of any TVF must be decorated with the SqlFunction attribute and must return an object that implements IEnumerable. This simply means the returned object must provide a parameterless method called GetEnumerator() that returns an instance of an object that implements IEnumerator.

The object that implements IEnumerator in turn implements the MoveNext() and Reset() methods and the Current property. If you have used and implemented .NET Framework collections, this approach should seem straightforward.

It may be useful to think of SQL Server as the “user” that calls the implemented methods of the code. The reason is that the actual runtime caller only needs to specify the name of the SqlFunction object; the caller doesn’t need to know (or care) how things actually get called under the covers at runtime.
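To make that calling sequence concrete, the following sketch plays the role of SQL Server against a toy table of squares. It runs as ordinary C# with no SQL Server types, and every name in it (TvfContractSketch, Squares, FillSquare, Drive) is hypothetical and used for illustration only.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class TvfContractSketch
{
    // Stand-in for a TVF's main method: returns an IEnumerable of rows.
    public static IEnumerable Squares(int count)
    {
        return new SquareReader(count);
    }

    // Stand-in for the fill-row method: unpacks one row object into columns.
    public static void FillSquare(object row, out int n, out int square)
    {
        object[] cols = (object[])row;
        n = (int)cols[0];
        square = (int)cols[1];
    }

    // Plays the role of SQL Server: enumerate, then call the fill-row method.
    public static List<string> Drive(int count)
    {
        List<string> output = new List<string>();
        IEnumerator e = Squares(count).GetEnumerator();
        while (e.MoveNext())
        {
            int n, square;
            FillSquare(e.Current, out n, out square);
            output.Add(n + ":" + square);
        }
        return output;
    }

    private sealed class SquareReader : IEnumerable
    {
        private readonly int _count;
        public SquareReader(int count) { _count = count; }
        public IEnumerator GetEnumerator() { return new SquareEnumerator(_count); }
    }

    private sealed class SquareEnumerator : IEnumerator
    {
        private readonly int _count;
        private int _n;
        private object[] _current;

        public SquareEnumerator(int count) { _count = count; }

        // Builds the next row, or returns false when the rows are exhausted.
        public bool MoveNext()
        {
            if (_n >= _count) return false;
            _n++;
            _current = new object[] { _n, _n * _n };
            return true;
        }

        public void Reset() { _n = 0; _current = null; }
        public object Current { get { return _current; } }
    }
}
```

The Drive() method is the only part you would not write in a real TVF; SQL Server supplies that loop for you.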

One thing on everyone’s wish list for T-SQL has always been the use of regular expressions because the LIKE operator just isn’t powerful enough for many matches. The code in Listing 5 contains a set of classes for a TVF that acts as a regular expression evaluator. It differs from many examples out there in a few respects:

It takes an input string, a user-defined type that represents a regular expression pattern (the RegexPattern UDT provides built-in pattern validation and storage), and an Int32 that represents the .NET Framework System.Text.RegularExpressions.RegexOptions enum.
It returns a two-column table of results, one row per match:
- The first column is an incremental ID for the match.
- The second column is an instance of the xml data type that contains the text of the match, the groups matched, and their respective captures.

The neat thing is that you can use this class just as you would the Regex.Match() method, options and all, and get a complete report of the matches on a per-match (think per-row) basis.
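The per-match report described above comes down to walking matches, then groups, then captures. The following sketch shows that traversal with the standard Regex classes alone, taking the options as an Int32 just as the TVF does. The names MatchWalkSketch and Describe are hypothetical, and the output is plain text rather than the xml column that the TVF returns.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchWalkSketch
{
    // Hypothetical helper: lists every match, and every capture of every
    // numbered group, as one line of text each.
    public static List<string> Describe(string input, string pattern, int options)
    {
        List<string> lines = new List<string>();
        Regex rex = new Regex(pattern, (RegexOptions)options);
        int matchIndex = 0;

        // NextMatch() continues the search where the previous match ended.
        for (Match m = rex.Match(input); m.Success; m = m.NextMatch())
        {
            matchIndex++;
            lines.Add("match " + matchIndex + ": " + m.Value);

            // Group 0 is the whole match, so numbered groups start at 1.
            for (int g = 1; g < m.Groups.Count; g++)
            {
                foreach (Capture cap in m.Groups[g].Captures)
                {
                    lines.Add("  group " + g + " capture at " + cap.Index + ": " + cap.Value);
                }
            }
        }
        return lines;
    }
}
```

For example, Describe("a1 b22", @"([a-z])(\d+)", 0) reports two matches, each with one capture for the letter group and one for the digit group.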

Tip

You need to add the C# RegexPattern struct to your assembly (covered in the next section on UDTs, in Listing 6) before you can deploy the code in Listing 5 to SQL Server using VS.

Listing 5. A Table-Valued UDF for Pattern Matching

using System;
using System.Data;
using System.Data.Sql;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

//added
using System.Text.RegularExpressions;
using System.Collections;
using System.Xml;
using System.IO;
using System.Data.SqlClient;

public class RegexLibrary
{
  [Microsoft.SqlServer.Server.SqlFunction
  (
    IsDeterministic = true,
    IsPrecise = true,
    Name = "MatchAll",
    DataAccess = DataAccessKind.None,
    SystemDataAccess = SystemDataAccessKind.None,
    FillRowMethodName = "FillMatchAll",
    TableDefinition =
        @"MatchIndex int,
          GroupList xml"
  )]
  public static IEnumerable MatchAll(string Input,
      RegexPattern Expression, Int32 Options)
  {
    return new RegexReader(Input, Expression, Options);
  }

  public static void FillMatchAll(
      object row,
      out SqlInt32 MatchIndex,
      out SqlXml GroupList)
  {
    Object[] RowArray = (Object[])row;
    MatchIndex = (SqlInt32)RowArray[0];
    GroupList = (SqlXml)RowArray[1];
  }

  public class RegexReader : IEnumerable
  {
    public String input = string.Empty;
    public RegexPattern Expression;
    public Int32 Options = int.MinValue;

    public RegexReader(String Input, RegexPattern Expression, Int32 Options)
    {
      this.input = Input;
      this.Expression = Expression;
      this.Options = Options;
    }

    //Called by SS after initialization
    public IEnumerator GetEnumerator()
    {
      return new RegexEnumerator(this);
    }
  }

  public class RegexEnumerator : IEnumerator
  {
    private Regex _rex = null;
    private Match _match = null;
    private Object[] _current = null;
    private RegexReader _reader = null;
    private int _matchIndex = 0;

    public RegexEnumerator(RegexReader Reader)
    {
      _reader = Reader;
      Reset();
    }

    public void Reset()
    {
      _rex = null;
      _matchIndex = 0;
      _current = null;
      _rex = new Regex(_reader.Expression.ToString(),
        (RegexOptions)_reader.Options);
      _match = _rex.Match(_reader.input);
    }

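    public bool MoveNext()
    {
      if (!_match.Success)
      {
        return false;
      }

      _matchIndex++;
      // One slot per output column: MatchIndex and GroupList.
      _current = new Object[2];
      _current[0] = (SqlInt32)_matchIndex;

      // Build the XML with XmlWriter rather than string concatenation so that
      // characters such as <, & and ' in the pattern or matched text are escaped.
      XmlWriterSettings settings = new XmlWriterSettings();
      settings.OmitXmlDeclaration = true;
      StringWriter sw = new StringWriter();
      using (XmlWriter xw = XmlWriter.Create(sw, settings))
      {
        xw.WriteStartElement("matchlog");
        xw.WriteAttributeString("pattern", _rex.ToString());
        xw.WriteAttributeString("options", _rex.Options.ToString());
        xw.WriteAttributeString("idx", _matchIndex.ToString());
        xw.WriteAttributeString("matchtext", _match.Value);

        // Group 0 is the whole match, so numbered groups start at 1.
        for (int g = 1; g < _match.Groups.Count; g++)
        {
          Group grp = _match.Groups[g];
          xw.WriteStartElement("group");
          xw.WriteAttributeString("idx", g.ToString());
          xw.WriteAttributeString("text", grp.Value);

          for (int c = 0; c < grp.Captures.Count; c++)
          {
            Capture cap = grp.Captures[c];
            xw.WriteStartElement("capture");
            xw.WriteAttributeString("idx", c.ToString());
            xw.WriteAttributeString("pos", cap.Index.ToString());
            xw.WriteAttributeString("text", cap.Value);
            xw.WriteEndElement(); // </capture>
          }

          xw.WriteEndElement(); // </group>
        }

        xw.WriteEndElement(); // </matchlog>
      }

      _current[1] = new SqlXml(
                new XmlTextReader(
                  new StringReader(sw.ToString())));

      // Advance to the next match for the following MoveNext() call.
      _match = _match.NextMatch();
      return true;
    }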

    public Object Current
    {
      get {
        return _current;
      }
    }
  }
}

When MatchAll() is invoked, it returns an instance of the RegexReader class, whose constructor stores the passed-in input string, regular expression, and options in its data members. SQL Server then invokes RegexReader’s GetEnumerator() instance method, which returns an instance of RegexEnumerator. RegexEnumerator does all the real work, using the members of the RegexReader object that is passed to its constructor and held in its private _reader field.

Reset() is called in RegexEnumerator’s constructor so that it can initialize its members in the following way:

RegexEnumerator uses a private Regex object (_rex) to perform the match and stores the first resulting Match in a private Match object (_match); MoveNext() later advances it by calling NextMatch().
The ordinal number of the match is kept in _matchIndex and initialized to 0 (in case there are no matches).
When Reset() is complete, it is up to SQL Server to iterate through the matches by calling MoveNext().

MoveNext() does the work of re-creating the row (represented as a private array of object called _current) for every successful match stored in _match:

_current[0] is set to the value of _matchIndex (incremented on a per-match basis) and corresponds to the output table column MatchIndex (defined in the TableDefinition named parameter).
_current[1] is set to the value of an XML document that is built for every match and contains subnodes for each group and group capture. This value corresponds to the output table column GroupList.

When SQL Server uses the RegexEnumerator, it first calls MoveNext() and then uses the Current property.

Next, execution passes to the method specified in FillRowMethodName (FillMatchAll()).

Finally, the CLR passes the latest value of _current to FillMatchAll() as the row parameter. Each out parameter of FillMatchAll() is set to the value for the columns in the output row.

Note

If this implementation seems daunting, the best way to overcome that is to walk through the function line by line in debug mode, using VS.